Mapping and masking are two important deep-learning-based speech enhancement methods that aim to recover the original clean speech from corrupted speech. In practice, large recovery errors severely restrict the improvement in speech quality. In a preliminary experiment, we showed that mapping and masking rely on different conversion mechanisms, and we therefore hypothesized that their recovery errors are likely to be complementary; this complementarity was then validated experimentally. Based on the principle of error minimization, we propose fusing mapping and masking for speech dereverberation. Specifically, we take the weighted mean of the amplitudes recovered by the two methods as the amplitude estimate of the fusion method. Experiments verify that the fusion method further reduces the recovery error. The proposed weighted mean method also outperforms the existing geometric mean method. In speech dereverberation experiments, the weighted mean method improves PESQ and SNR by 5.8% and 25.0%, respectively, compared with the traditional masking method.
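The fusion rule described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the scalar `weight` parameter, its default of 0.5, and the toy amplitude values are all assumptions made for illustration, since the abstract does not state how the weights are chosen. The sketch only shows why a weighted mean can shrink the error when the two estimates err in opposite directions, and how it differs from a geometric mean.

```python
import numpy as np


def weighted_mean_fusion(amp_mapping, amp_masking, weight=0.5):
    """Fuse two amplitude estimates as a convex combination.

    `weight` (hypothetical parameter) is applied to the mapping estimate,
    and `1 - weight` to the masking estimate.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    amp_mapping = np.asarray(amp_mapping, dtype=float)
    amp_masking = np.asarray(amp_masking, dtype=float)
    return weight * amp_mapping + (1.0 - weight) * amp_masking


def geometric_mean_fusion(amp_mapping, amp_masking):
    """Fuse two non-negative amplitude estimates by their geometric mean."""
    amp_mapping = np.asarray(amp_mapping, dtype=float)
    amp_masking = np.asarray(amp_masking, dtype=float)
    return np.sqrt(amp_mapping * amp_masking)


def mean_abs_error(estimate, reference):
    """Mean absolute amplitude error against the clean reference."""
    return float(np.mean(np.abs(np.asarray(estimate) - np.asarray(reference))))


# Toy data (made up): the two estimates err in opposite directions,
# which is the complementary case the abstract hypothesizes.
clean = np.array([1.0, 2.0, 3.0, 4.0])
amp_mapping = clean + np.array([0.2, -0.1, 0.3, -0.2])
amp_masking = clean + np.array([-0.2, 0.1, -0.1, 0.2])

fused_weighted = weighted_mean_fusion(amp_mapping, amp_masking, weight=0.5)
fused_geometric = geometric_mean_fusion(amp_mapping, amp_masking)

err_mapping = mean_abs_error(amp_mapping, clean)
err_masking = mean_abs_error(amp_masking, clean)
err_weighted = mean_abs_error(fused_weighted, clean)
err_geometric = mean_abs_error(fused_geometric, clean)

print("mapping  :", err_mapping)
print("masking  :", err_masking)
print("weighted :", err_weighted)
print("geometric:", err_geometric)
```

On this toy input the opposite-signed errors partly cancel in the weighted mean, so its error falls below that of either individual estimate. When both estimates err in the same direction, a convex combination cannot do better than the more accurate of the two, which is why the complementarity of the errors matters.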